Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense classification problems such as semantic segmentation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular, we achieve an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset, which is the best reported result to date.
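The abstract reports an intersection-over-union score but does not spell out how it is computed. As a rough illustration only, the sketch below shows the usual per-class intersection-over-union averaged over classes for flat lists of predicted and ground-truth labels. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both inputs are assumptions made for this example; the benchmark's own evaluation tooling may differ in details.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(f"mean IoU = {score:.3f}")
```

In this toy case the per-class values are 1/3, 2/3 and 1/2. A score such as 83.4 corresponds to this kind of ratio expressed as a percentage.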